class: center, middle, inverse, title-slide # Data visualization with
ggplot2
### AECN 396/896-002 --- <style type="text/css"> .remark-slide-content.hljs-github h1 { margin-top: 5px; margin-bottom: 25px; } .remark-slide-content.hljs-github { padding-top: 10px; padding-left: 30px; padding-right: 30px; } .panel-tabs { <!-- color: #062A00; --> color: #841F27; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 0px; } .panel-tab { margin-top: 0px; margin-bottom: 0px; margin-left: 3px; margin-right: 3px; padding-top: 0px; padding-bottom: 0px; } .panelset .panel-tabs .panel-tab { min-height: 40px; } .remark-slide th { border-bottom: 1px solid #ddd; } .remark-slide thead { border-bottom: 0px; } .gt_footnote { padding: 2px; } .remark-slide table { border-collapse: collapse; } .remark-slide tbody { border-bottom: 2px solid #666; } .important { background-color: lightpink; border: 2px solid blue; font-weight: bold; } .remark-code { display: block; overflow-x: auto; padding: .5em; background: #ffe7e7; } .hljs-github .hljs { background: #f2f2fd; } .remark-inline-code { padding-top: 0px; padding-bottom: 0px; background-color: #e6e6e6; } .r.hljs.remark-code.remark-inline-code{ font-size: 0.9em } .left-full { width: 80%; float: left; } .left-code { width: 38%; height: 92%; float: left; } .right-plot { width: 60%; float: right; padding-left: 1%; } .left6 { width: 60%; height: 92%; float: left; } .left5 { width: 49%; <!-- height: 92%; --> float: left; } .right5 { width: 49%; float: right; padding-left: 1%; } .right4 { width: 39%; float: right; padding-left: 1%; } .left3 { width: 29%; height: 92%; float: left; } .right7 { width: 69%; float: right; padding-left: 1%; } .left4 { width: 38%; float: left; } .right6 { width: 60%; float: right; padding-left: 1%; } ul li{ margin: 7px; } ul, li{ margin-left: 15px; padding-left: 0px; } ol li{ margin: 7px; } ol, li{ margin-left: 15px; padding-left: 0px; } </style> <style type="text/css"> .content-box { box-sizing: border-box; background-color: #e2e2e2; } .content-box-blue, .content-box-gray, 
.content-box-grey, .content-box-army, .content-box-green, .content-box-purple, .content-box-red, .content-box-yellow { box-sizing: border-box; border-radius: 5px; margin: 0 0 10px; overflow: hidden; padding: 0px 5px 0px 5px; width: 100%; } .content-box-blue { background-color: #F0F8FF; } .content-box-gray { background-color: #e2e2e2; } .content-box-grey { background-color: #F5F5F5; } .content-box-army { background-color: #737a36; } .content-box-green { background-color: #d9edc2; } .content-box-purple { background-color: #e2e2f9; } .content-box-red { background-color: #ffcccc; } .content-box-yellow { background-color: #fef5c4; } .content-box-blue .remark-inline-code, .content-box-blue .remark-inline-code, .content-box-gray .remark-inline-code, .content-box-grey .remark-inline-code, .content-box-army .remark-inline-code, .content-box-green .remark-inline-code, .content-box-purple .remark-inline-code, .content-box-red .remark-inline-code, .content-box-yellow .remark-inline-code { background: none; } .full-width { display: flex; width: 100%; flex: 1 1 auto; } </style> <style type="text/css"> blockquote, .blockquote { display: block; margin-top: 0.1em; margin-bottom: 0.2em; margin-left: 5px; margin-right: 5px; border-left: solid 10px #0148A4; border-top: solid 2px #0148A4; border-bottom: solid 2px #0148A4; border-right: solid 2px #0148A4; box-shadow: 0 0 6px rgba(0,0,0,0.5); /* background-color: #e64626; */ color: #e64626; padding: 0.5em; -moz-border-radius: 5px; -webkit-border-radius: 5px; } .blockquote p { margin-top: 0px; margin-bottom: 5px; } .blockquote > h1:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h2:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h3:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h4:first-of-type { margin-top: 0px; margin-bottom: 5px; } .text-shadow { text-shadow: 0 0 4px #424242; } </style> <style type="text/css"> /****************** * Slide scrolling * (non-functional) * not 
sure if it is a good idea anyway slides > slide { overflow: scroll; padding: 5px 40px; } .scrollable-slide .remark-slide { height: 400px; overflow: scroll !important; } ******************/ .scroll-box-8 { height:8em; overflow-y: scroll; } .scroll-box-10 { height:10em; overflow-y: scroll; } .scroll-box-12 { height:12em; overflow-y: scroll; } .scroll-box-14 { height:14em; overflow-y: scroll; } .scroll-box-16 { height:16em; overflow-y: scroll; } .scroll-box-18 { height:18em; overflow-y: scroll; } .scroll-box-20 { height:20em; overflow-y: scroll; } .scroll-box-24 { height:24em; overflow-y: scroll; } .scroll-box-30 { height:30em; overflow-y: scroll; } .scroll-output { height: 90%; overflow-y: scroll; } </style>

# Before you start

## Learning objectives

The objective of this chapter is to learn how to use the `ggplot2` package to create figures for effective communication.

## Table of contents

1. [`ggplot2` basics](#ggplot2-basics)
2. [Different types of figures](#dif-geoms)
3. [Placing more information in one figure](#more-info)
4. [Faceted figures](#faceting)
5. [Other supplementary `geom_*()`](#other-geoms)

<br>

<span style="color:red"> Tip: </span>hitting the "o" key gives you a panel view of the slides.

---

# `ggplot2` package

.left-full[

Install the package if you have not done so yet:

```r
install.packages("ggplot2")
```

Loading the `tidyverse` package automatically loads `ggplot2` as well.
```r
#--- load ggplot2 along with others in the tidyverse package ---#
library(tidyverse)

#--- or ---#
*library(ggplot2)
```

]

---

# The datasets we use

.panelset[
.panel[.panel-name[Instruction]

Go [here](https://www.dropbox.com/sh/63rlp4ydmyjm1ui/AACYSeN0f_WAgKPQKzgpGVe0a?dl=0), download **county_yield.rds**, and then read the file into R:

]

.panel[.panel-name[R Code]

```r
county_yield <- readRDS("county_yield.rds") %>%
  dplyr::select(
    soy_yield, corn_yield, year, county_code, state_name,
    d0_5_9, d1_5_9, d2_5_9, d3_5_9, d4_5_9
  ) %>%
  filter(state_name %in% c("Nebraska", "Kansas", "Colorado"))
```

]

.panel[.panel-name[Output]

```
##       soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9
##    1:        NA         NA 2018         053     Kansas 0.8980 3.8186
##    2:        NA         NA 2017         053     Kansas 3.9994 7.0006
##    3:        NA         NA 2016         053     Kansas 0.5724 0.0996
##    4:        NA         NA 2015         053     Kansas 4.4283 1.6177
##    5:        NA         NA 2014         053     Kansas 4.7032 9.9327
##   ---
## 2960:        53        181 2004         073   Nebraska 0.0000 3.2915
## 2961:        57        195 2003         073   Nebraska 0.0000 7.7427
## 2962:        51        170 2002         073   Nebraska 0.0000 7.0000
## 2963:        56        195 2001         073   Nebraska 5.7915 0.0000
## 2964:        54        147 2000         073   Nebraska 0.0000 4.7386
##        d2_5_9 d3_5_9 d4_5_9
##    1: 13.5279 0.0000      0
##    2:  0.0000 0.0000      0
##    3:  0.0000 0.0000      0
##    4:  0.0000 0.0000      0
##    5:  3.5824 4.7817      0
##   ---
## 2960: 19.7085 0.0000      0
## 2961: 11.8459 3.4114      0
## 2962:  1.2978 4.7022      9
## 2963:  0.0000 0.0000      0
## 2964: 17.6887 0.5727      0
```

]

.panel[.panel-name[Variable Definitions]

+ `soy_yield`: soybean yield (bu/acre)
+ `corn_yield`: corn yield (bu/acre)
+ `d0_5_9`: ratio of weeks under drought severity of 0 from May to September
+ `d1_5_9`: ~ drought severity of 1 from May to September
+ `d2_5_9`: ~ drought severity of 2 from May to September
+ `d3_5_9`: ~ drought severity of 3 from May to September
+ `d4_5_9`: ~ drought severity of 4 from May to September

]

]

<!-- #========================================= # ggplot2 Basics #========================================= -->

---

class: inverse,
center, middle name: ggplot2-basics # `ggplot2` basics <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r *county_yield ``` ] .panel2-taste-auto[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 ## d2_5_9 d3_5_9 d4_5_9 ## 1: 13.5279 0.0000 0 ## 2: 0.0000 0.0000 0 ## 3: 0.0000 0.0000 0 ## 4: 0.0000 0.0000 0 ## 5: 3.5824 4.7817 0 ## --- ## 2960: 19.7085 0.0000 0 ## 2961: 11.8459 3.4114 0 ## 2962: 1.2978 4.7022 9 ## 2963: 0.0000 0.0000 0 ## 2964: 17.6887 0.5727 0 ``` ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% * ggplot(data = .) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + * aes(x = factor(year)) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) 
+ aes(x = factor(year)) + * aes(y = corn_yield) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + * geom_boxplot() ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + * aes(fill = state_name) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_06_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + * facet_grid(state_name ~ .) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_07_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + * xlab("Year") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_08_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + * ylab("Corn Yield (bu/acre)") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_09_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + * ylim(c(100, 200)) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_10_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + * scale_fill_viridis_d( * name = "State", * guide = guide_legend( * title.position = "left" * ) * ) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_11_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + * theme_bw() ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_12_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) 
+ aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + * theme(axis.text.x = element_text(angle = 90, size = 6)) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_13_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + * theme(axis.text.y = element_text(size = 6)) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_14_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + * theme(legend.position = "bottom") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_15_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + * theme( * legend.title = element_text(size = 6), * legend.text = element_text(size = 6) * ) ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_16_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + * labs(title = "Corn Yield (bu/acre) by State") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_17_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + labs(title = "Corn Yield (bu/acre) by State") + * labs(caption = "Design: Taro Mieno") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_18_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-auto[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + labs(title = "Corn Yield (bu/acre) by State") + labs(caption = "Design: Taro Mieno") + * labs(subtitle = "Data Source: USDA-NASS") ``` ] .panel2-taste-auto[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/taste_auto_19_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-taste-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-taste-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-taste-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>

---

# Basics

.panelset[
.panel[.panel-name[Step 1]

.left-full[

The first step in creating a figure with the `ggplot2` package is to tell R which dataset you want to visualize. You do this with `ggplot()`:

```r
g_fig <- ggplot(data = county_yield)
```

Whenever you create a figure with the `ggplot2` package, `ggplot()` is the first function you call.

]

]

.panel[.panel-name[g_fig]

.left-code[

Let's now see what is inside `g_fig`:

```r
g_fig
```

Well, it's blank. `g_fig` does not yet have enough information to draw any kind of figure: you have not told R anything specific about how you would like to use the information in the dataset.
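If you want to check why the plot is blank, you can inspect the object itself. The sketch below rests on two assumptions: it uses R's built-in `mtcars` data as a stand-in for `county_yield` (so it runs without the downloaded file), and it peeks at `$layers`, which is part of the object's internal structure rather than a documented interface and may change between `ggplot2` versions. `g_blank` and `g_one` are illustrative names.

```r
library(ggplot2)

#--- mtcars stands in for county_yield (assumption: any data frame works) ---#
g_blank <- ggplot(data = mtcars)

#--- the object knows its data, but no geom_*() layer has been added yet ---#
length(g_blank$layers)  # 0

#--- adding geom_point() appends exactly one layer ---#
g_one <- g_blank + geom_point(aes(x = wt, y = mpg))
length(g_one$layers)    # 1
```

Each `+ geom_*()` appends one more layer, which is why a bare `ggplot()` object draws an empty panel.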
]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[Step 2]

.left-full[

Next, tell `g_fig` what type of figure you want using one of the `geom_*()` functions. For example, `geom_point()` creates a scatter plot. To draw a scatter plot, R also needs to know which variables go on the x-axis and the y-axis. This information is passed to `g_fig` with the following code:

```r
g_fig_scatter <- g_fig +
  geom_point(aes(x = d3_5_9, y = corn_yield))
```

Here,

+ `geom_point()` is added to `g_fig` to declare that you want a scatter plot
+ `aes(x = d3_5_9, y = corn_yield)` inside `geom_point()` tells R to put `d3_5_9` on the x-axis and `corn_yield` on the y-axis

]

]

.panel[.panel-name[Output]

.left-code[

This is what `g_fig_scatter` looks like:

```r
g_fig_scatter
```

]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/scatter-plot-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[`aes()`]

.left-full[

Going back to the code,

```r
g_fig_scatter <- g_fig +
  geom_point(aes(x = d3_5_9, y = corn_yield))
```

Note that `x = d3_5_9` and `y = corn_yield` are inside `aes()`.

<span style="color:red"> Important:</span> `aes()` makes the <span style='color:red'>aes</span>thetics of the figure functions of variables in the dataset you passed to `ggplot()` (here, `county_yield`). `aes(x = d3_5_9, y = corn_yield)` tells `ggplot` to use the `d3_5_9` and `corn_yield` variables in `county_yield` for the x-axis and y-axis, respectively.

If `x = d3_5_9` and `y = corn_yield` are not inside `aes()`, R looks for standalone objects named `d3_5_9` and `corn_yield` (not the columns of `county_yield`), which you have not defined.
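Here is a small, self-contained sketch of the difference between mapping a variable inside `aes()` and passing it directly to `geom_point()`. It assumes the built-in `mtcars` data in place of `county_yield`; `wt` and `hp` are columns of `mtcars`, not objects in your workspace, and `g_ok`/`res` are illustrative names.

```r
library(ggplot2)

#--- inside aes(): wt and hp are looked up as columns of mtcars ---#
g_ok <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = hp))

#--- outside aes(): R evaluates wt and hp as ordinary objects,
#    finds no object called wt, and stops with an error ---#
res <- tryCatch(
  ggplot(data = mtcars) + geom_point(x = wt, y = hp),
  error = function(e) e
)

inherits(res, "error")  # TRUE
```

`tryCatch()` is used only so the sketch runs to the end; without it, the second call simply stops with an error.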
Try the following; it stops with an error because no object called `d3_5_9` exists:

```r
g_fig + geom_point(x = d3_5_9, y = corn_yield)
```

]

]

.panel[.panel-name[summary]

.left-full[

+ use `ggplot(data = dataset)` to start creating a figure
+ add a `geom_*()` to declare what kind of figure you want to make
+ inside `aes()`, specify which variables in the dataset to use and how to use them
+ place that `aes()` inside the `geom_*()` you chose

]

]

]

<!-- #========================================= # Different types of figures #========================================= -->

---

class: inverse, center, middle
name: dif-geoms

# Different types of figures

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html>

---

# Different types of figures

.panelset[
.panel[.panel-name[Figure types]

<br>

`ggplot2` lets you create many different kinds of figures through its `geom_*()` functions:

+ `geom_histogram()`/`geom_density()`
+ `geom_line()`
+ `geom_boxplot()`
+ `geom_bar()`

Which aesthetics you need to specify varies by `geom_*()`.

]

.panel[.panel-name[Histogram]

.left-code[

```r
g_fig +
  geom_histogram(
    aes(x = corn_yield)
  )
```

`geom_histogram()` only needs `x`.

]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/hist-ex-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[Density Plot]

.left-code[

```r
g_fig +
  geom_density(
    aes(x = corn_yield)
  )
```

`geom_density()` only needs `x`.

]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/density-ex-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[Line plot]

.left-code[

Create a dataset first:

```r
mean_yield <- county_yield %>%
  group_by(year) %>%
  summarize(
    corn_yield = mean(corn_yield, na.rm = TRUE)
  ) %>%
  filter(!is.na(year))
```

Create a line plot:

```r
ggplot(data = mean_yield) +
  geom_line(aes(x = year, y = corn_yield))
```

+ `geom_line()` needs `x` and `y`.
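If you want to try this summarize-then-plot workflow without the course data, here is a sketch on data that ships with `ggplot2`. It assumes the built-in `txhousing` data (monthly Texas housing statistics) as a stand-in for `county_yield`, with `median` (the monthly median sale price) playing the role of `corn_yield`; `mean_price`, `avg_median_price`, and `g_line` are illustrative names.

```r
library(ggplot2)
library(dplyr)

#--- step 1: one row per year (average of the monthly median sale prices) ---#
mean_price <- txhousing %>%
  group_by(year) %>%
  summarize(avg_median_price = mean(median, na.rm = TRUE))

#--- step 2: geom_line() connects the yearly values in order of x ---#
g_line <- ggplot(data = mean_price) +
  geom_line(aes(x = year, y = avg_median_price))
```

Summarizing first leaves one `y` value per `x`, so `geom_line()` draws a single clean line instead of zigzagging through every monthly observation.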
]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/line-ex-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[Boxplot]

.left-code[

```r
ggplot(data = county_yield) +
  geom_boxplot(
    aes(x = factor(year), y = corn_yield)
  )
```

+ `geom_boxplot()` needs `x` and `y` (wrapping `year` in `factor()` gives one box per year)

]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/box-ex-1.png" width="90%" style="display: block; margin: auto;" />
]

]

.panel[.panel-name[Bar plot]

.left-code[

```r
ggplot(data = mean_yield) +
  geom_bar(
    aes(
      x = year,
      y = corn_yield
    ),
    stat = "identity"
  )
```

+ with `stat = "identity"`, `geom_bar()` needs `x` and `y` and uses the `y` values as bar heights (`geom_col()` does the same without the `stat` argument)

]

.right-plot[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/bar-ex-1.png" width="90%" style="display: block; margin: auto;" />
]

]

]

---

# Modifying how figures look

.left-full[

The figures in the previous section all use the default colors (black, white, and gray). You can change how figure elements look by providing options inside `geom_*()`.
Here is a list of options that control the aesthetics of figures:

+ fill
+ color
+ size
+ shape
+ linetype

Which elements you can modify differs by `geom` type, and the same option can mean different things for different `geom` types (for example, `color` sets the color of the points in a scatter plot but the color of the bar outlines in a histogram).

]

---

count: false

# Scatter Plot

.panel1-fig-scatter-f-auto[

```r
*g_fig # BREAK
```

]

.panel2-fig-scatter-f-auto[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/fig-scatter-f_auto_01_output-1.png" width="100%" style="display: block; margin: auto;" />
]

---

count: false

# Scatter Plot

.panel1-fig-scatter-f-auto[

```r
g_fig + # BREAK
*  geom_point(
*    aes(x = d3_5_9, y = corn_yield),
*    color = "red", # BREAK2
*    size = 0.7, # BREAK3
*    shape = 0 # BREAK4
*  )
```

]

.panel2-fig-scatter-f-auto[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/fig-scatter-f_auto_02_output-1.png" width="100%" style="display: block; margin: auto;" />
]

<style> .panel1-fig-scatter-f-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-fig-scatter-f-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-fig-scatter-f-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>

+ `color = "red"`: makes all the squares red
+ `size = 0.7`: makes the squares smaller than the default
+ `shape = 0`: changes the points to hollow squares (find other shapes [here](http://www.sthda.com/english/wiki/ggplot2-point-shapes))

---

count: false

# Histogram

.panel1-hist-f-auto[

```r
*g_fig
```

]

.panel2-hist-f-auto[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/hist-f_auto_01_output-1.png" width="100%" style="display: block; margin: auto;" />
]

---

count: false

# Histogram

.panel1-hist-f-auto[

```r
g_fig +
*  geom_histogram(
*    aes(x = corn_yield),
*    color = "blue",
*    fill = "green",
*    size = 2,
*    shape = 2
*  )
```

]

.panel2-hist-f-auto[
<img
src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/hist-f_auto_02_output-1.png" width="100%" style="display: block; margin: auto;" />
]

<style> .panel1-hist-f-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-hist-f-auto { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-hist-f-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>

+ `color = "blue"`: makes the outlines of the bars blue
+ `fill = "green"`: makes the fill of the bars green
+ `size = 2`: makes the outlines of the bars thicker (in ggplot2 3.4.0 and later, `linewidth` is the preferred name for this option)
+ `shape = 2`: does nothing

---

count: false

# Box Plot

.panel1-box-f-non_seq[

```r
ggplot(data = county_yield) + # BREAK
  geom_boxplot(
    aes(x = factor(year), y = corn_yield),
    color = "red", # BREAK2
    fill = "orange", # BREAK3
    size = 0.2, # BREAK4
    shape = 1 # BREAK5
  )
```

]

.panel2-box-f-non_seq[
<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/box-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" />
]

<style> .panel1-box-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-box-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-box-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>

+ `color = "red"`: makes the outlines of the boxes red
+ `fill = "orange"`: makes the fill of the boxes orange
+ `size = 0.2`: makes the outlines of the boxes thinner (again, `linewidth` in ggplot2 3.4.0 and later)
+ `shape = 1`: does nothing

---

count: false

# Line Plot

.panel1-line-f-non_seq[

```r
ggplot(data = mean_yield) + # BREAK1
  geom_line(
    aes(x = year, y = corn_yield),
    color = "blue", # BREAK2
    size = 1.5, # BREAK3
    fill = "red", # BREAK4
    linetype = "dotted" # BREAK5
  )
```

]

.panel2-line-f-non_seq[
<img
src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/line-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" />
]

<style> .panel1-line-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-line-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-line-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>

+ `color = "blue"`: makes the line blue
+ `size = 1.5`: makes the line thicker (`linewidth` in ggplot2 3.4.0 and later)
+ `fill = "red"`: does nothing
+ `linetype = "dotted"`: makes the line dotted

---

# Exercises

.panelset[
.panel[.panel-name[Instruction]

This exercise uses the `diamonds` dataset from the `ggplot2` package. First, load the dataset and extract the observations with `Premium` cut whose color is one of `E`, `I`, and `F`:

```r
data("diamonds")

premium <- diamonds %>%
  filter(
    cut == "Premium" & color %in% c("E", "I", "F")
  )

#--- take a look ---#
premium
```

```
## # A tibble: 6,096 × 10
##    carat cut     color clarity depth table price     x     y     z
##    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
##  2  0.29 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
##  3  0.22 Premium F     SI1      60.4    61   342  3.88  3.84  2.33
##  4  0.2  Premium E     SI2      60.2    62   345  3.79  3.75  2.27
##  5  0.32 Premium E     I1       60.9    58   345  4.38  4.42  2.68
##  6  0.24 Premium I     VS1      62.5    57   355  3.97  3.94  2.47
##  7  0.29 Premium F     SI1      62.4    58   403  4.24  4.26  2.65
##  8  0.22 Premium E     VS2      61.6    58   404  3.93  3.89  2.41
##  9  0.42 Premium I     SI2      61.5    59   552  4.78  4.84  2.96
## 10  0.24 Premium E     VVS1     60.7    58   553  4.01  4.03  2.44
## # … with 6,086 more rows
```

]

.panel[.panel-name[Exercise 1]

<br>

Using the `carat` and `price` variables from `premium`, generate the figure below:

<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/diamond_fig_1-1.png" width="50%" style="display: block; margin: auto;" />
]

.panel[.panel-name[Exercise 2]

<br>

Using the `price` variable from `premium`, generate the histogram of `price` shown below:

<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/diamond_fig_2-1.png" width="50%" style="display: block; margin: auto;" />

]

]

<!-- #========================================= # Placing more information #========================================= -->

---

class: inverse, center, middle
name: more-info

# Placing more information in one figure

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html>

---

# Placing more information in one figure

.panelset[
.panel[.panel-name[Motivation]

<br>

So far, we have learned how to create popular types of figures. We can make a figure much more informative by making its aesthetics data-dependent.

For example, suppose you want to compare the history of irrigated corn yield across states in a line plot. You would draw one line per state and make the lines distinguishable, so that readers know which line belongs to which state, like this:

<img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/more-info-1.png" width="60%" style="display: block; margin: auto;" />

]

.panel[.panel-name[How]

.left-full[

You can make the aesthetics of a figure data-dependent by specifying the variable that differentiates them <span style="color:red"> INSIDE </span>`aes()`. Here is an example:

<code class ='r hljs remark-code'>ggplot(data = county_yield_mean) +<br> geom_line(<br> aes(y = corn_yield, x = year, <span style='background-color:#ffff7f'>color = state_name</span>)<br> )</code>

In this code, `color = state_name` is inside `aes()`, so R splits the data into groups by `state_name` and draws one line per state, each in a different color. A legend is generated automatically.
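To confirm that `color = ...` inside `aes()` really splits the data into groups, you can look at the data `ggplot2` builds for the layer with `layer_data()`. The sketch below assumes the built-in `mpg` data in place of `county_yield_mean`, with drive train (`drv`) playing the role of `state_name`; `mpg_mean`, `g_color`, and `built` are illustrative names.

```r
library(ggplot2)
library(dplyr)

#--- mean highway mileage by drive train and model year ---#
mpg_mean <- mpg %>%
  group_by(drv, year) %>%
  summarize(hwy = mean(hwy), .groups = "drop")

#--- color = drv inside aes(): one colored line per drive train ---#
g_color <- ggplot(data = mpg_mean) +
  geom_line(aes(x = year, y = hwy, color = drv))

#--- the built layer data has one group (and one color) per level of drv ---#
built <- layer_data(g_color)
length(unique(built$group))   # 3 ("4", "f", "r")
length(unique(built$colour))  # 3
```

Mapping a discrete variable to `color` sets the grouping as a side effect, which is why no separate `group = drv` is needed to get one line per drive train.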
] ] .panel[.panel-name[Let's do it] <br> .left-code[ First, create a dataset of mean corn yield by state and year: ```r county_yield_mean <- county_yield %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield, na.rm = TRUE)) ``` Create a plot: ```r ggplot(data = county_yield_mean) + geom_line( aes( y = corn_yield, x = year, * color = state_name ) ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/do-it-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- count: false # More examples: Density Plot .panel1-density-more-non_seq[ ```r ggplot(data = county_yield_mean) + # BREAK geom_density( aes( x = corn_yield, fill = state_name # BREAK2 ), alpha = 0.3 ) ``` ] .panel2-density-more-non_seq[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/density-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-density-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-density-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-density-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # More examples: Boxplot .panel1-box-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .)
+ geom_boxplot( aes( x = factor(year), y = corn_yield, fill = state_name # BREAK2 ) ) ``` ] .panel2-box-more-non_seq[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/box-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-box-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-box-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-box-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # More examples: Scatter Plot .panel1-scatter-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) + # BREAK geom_point( aes( x = d3_5_9, y = corn_yield, color = state_name, # BREAK2 shape = state_name # BREAK3 ), size = 0.7 ) ``` ] .panel2-scatter-more-non_seq[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/scatter-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-scatter-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-scatter-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-scatter-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Exercises .panelset[ .panel[.panel-name[Exercise 1] <br> Using `premium`, create a scatter plot of `price` (y-axis) against `depth` (x-axis) by `clarity` as shown below: <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ex_2_1-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Exercise 2] <br> Using `premium`, create density plots of `carat` by `color` as shown below (set `alpha` to 0.5): <img 
src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ex_2_2-1.png" width="60%" style="display: block; margin: auto;" /> ] ] <!-- #========================================= # Faceting #========================================= --> --- class: inverse, center, middle name: faceting # Faceting <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Faceting: Basics .panelset[ .panel[.panel-name[Motivation] Sometimes, you may want to visualize information across groups on separate panels. .left5[ Too much information in one panel? <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/box_all-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ On separate panels (faceting)? <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/box-faceted-1-1.png" width="100%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[How] .left-full[ We can make faceted figures by adding either `facet_wrap()` or `facet_grid()`, where you specify the variable to use for faceting. Here is an example: <code class ='r hljs remark-code'>ggplot(data = county_yield) +<br> geom_boxplot(<br> aes(x = factor(year), y = corn_yield)<br> ) +<br> <span style='background-color:#ffff7f'>facet_wrap(state_name ~ .)</span></code> In this code, `facet_wrap(state_name ~ .)` is added to a simple boxplot, which tells R to draw a separate boxplot panel for each value of `state_name` (state). What does `~ .` do? ] ] ] --- count: false # Faceting: an Example .panel1-facet-ex-user[ ```r *ggplot(data = county_yield) + * geom_boxplot( * aes(x = factor(year), y = corn_yield) * ) + # BREAK * facet_wrap(state_name ~ .)
``` ] .panel2-facet-ex-user[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/facet-ex_user_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-facet-ex-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-facet-ex-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-facet-ex-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Faceting: Two-way .panelset[ .panel[.panel-name[two-way faceting] .left-full[ Two-way faceting will + divide the data into groups, where each group is a unique combination of the two faceting variables + create a plot for each group ```r ggplot(data = county_yield) + geom_histogram( aes(x = corn_yield) ) + * facet_wrap(state_name ~ year) ``` This code creates a histogram of corn yield for each unique state-year combination. ] ] .panel[.panel-name[Figure 2] .left-code[ Filter `county_yield` to the observations from 2017 and 2018: ```r county_yield_s <- county_yield %>% filter(year %in% c(2017, 2018)) ``` Create faceted histograms:
```r ggplot(data = county_yield_s) + geom_histogram( aes(x = corn_yield) ) + facet_wrap(state_name ~ year) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/two-ex-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Faceting with `facet_grid()` .panelset[ .panel[.panel-name[compare] .left5[ `facet_wrap()` ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_wrap(state_name ~ year) ``` <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/wrap-ex-1-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ `facet_grid()` ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/frig-ex-1-1.png" width="100%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[facet_grid()] .left-code[ With `facet_grid()`, unlike `facet_wrap()`, the side of `~` on which you place each faceting variable determines the layout: + left-hand side: rows + right-hand side: columns ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` In the code above, the values of `state_name` become the rows, and the values of `year` become the columns.
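A minimal sketch for confirming this row/column assignment without reading the figure: the built plot stores a panel layout table with `ROW` and `COL` columns. It assumes `ggplot2` is installed; `toy` is hypothetical data standing in for `county_yield_s`.

```r
library(ggplot2)

# Hypothetical toy data standing in for county_yield_s (values are made up)
toy <- data.frame(
  state_name = rep(c("Kansas", "Nebraska"), each = 20),
  year = rep(c(2017, 2018), times = 20),
  corn_yield = c(seq(150, 188, by = 2), seq(160, 198, by = 2))
)

p <- ggplot(toy) +
  geom_histogram(aes(x = corn_yield), bins = 10) +
  facet_grid(state_name ~ year)

# The built layout records which row (ROW) and column (COL) each panel occupies
panel_layout <- ggplot_build(p)$layout$layout
panel_layout[, c("ROW", "COL", "state_name", "year")]
```

Swapping the formula to `facet_grid(year ~ state_name)` swaps which variable drives `ROW` and which drives `COL`.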
] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/grid-matter-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[order] .left5[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/wrap-left-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(year ~ state_name) ``` <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/grid-right-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[scale] .left-full[ Through its `scales` option, `facet_grid()` allows + figures in different columns to have different x-axis scales (figures in the same column share the same x-axis scale) + figures in different rows to have different y-axis scales (figures in the same row share the same y-axis scale) ] ] <!-- panel ends here --> .panel[.panel-name[free x] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * scales = "free_x" ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/free-x-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[free y] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * scales = "free_y" ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/free-y-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[both free] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~
year, * scales = "free" ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/both-free-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[strip label] .left-code[ You can change strip labels with the `labeller = ` option inside `facet_grid()` (or `facet_wrap()`). To do this, create a named vector: its values are the new labels, and its element names are the corresponding values of the faceting variable. Define labels first: ```r #--- the vector values are new strip labels ---# year_labels <- paste0("Year = ", c(2017, 2018)) #--- the element names are the values to replace ---# names(year_labels) <- c("2017", "2018") ``` Create a faceted figure with new labels: ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * labeller = labeller(year = year_labels) ) ``` With `year = year_labels`, you apply `year_labels` to the faceting variable `year`. ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/strip-label-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[or] .left-code[ Alternatively, you can create a variable whose values are the labels you want and use it as the faceting variable: ```r county_yield_s %>% mutate( * year_text = paste0("Year = ", year) ) %>% ggplot(data = .)
+ geom_histogram(aes(x = corn_yield)) + facet_grid( * state_name ~ year_text ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/alt-strip-label-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Exercises .panelset[ .panel[.panel-name[Exercise 1] <br> Using `premium`, create scatter plots of `price` (y-axis) against `carat` (x-axis) by `color` on separate panels as shown below: <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ex_3_1-1.png" width="50%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Exercise 2] <br> Using `premium`, create histograms of `carat` by `color` and `clarity` on separate panels as shown below: <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ex_3_2-1.png" width="50%" style="display: block; margin: auto;" /> ] ] --- # Density plot, histogram, boxplot .panelset[ .panel[.panel-name[density-histogram] Density plots and histograms convey essentially the same information. .content-box-green[**Key difference**]: + Density plots are smoothed, normalized versions of histograms: the area under each curve is 1.
+ Histograms also convey the number of observations, in addition to the distribution .left5[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/unnamed-chunk-14-1.png" width="80%" style="display: block; margin: auto;" /> ] .right5[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[density-box] .content-box-green[**Key difference**]: + a density plot provides complete information about the distribution of a single variable, but key summary statistics such as the mean or median are not shown + a boxplot provides only partial information about the distribution of a single variable, but it takes up much less space in a figure .left3[ For this reason, boxplots are particularly useful when you want to show the distribution of a single variable across groups and over time in a single panel (see the figure to the right for an example). You could convey similar information with density plots faceted by year, but the full distribution is often unnecessary. ] .right7[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] <!-- panel set ends here --> --- # Preparing datasets .left-full[ We have seen + figures whose main elements (points, lines, boxes, etc.) are color-differentiated (e.g., with `aes(color = var)` inside a `geom_*()` function) + faceted figures .content-box-blue[.red[Important]: the dataset has to be in long format to create these types of figures!!]
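Reshaping a wide table into long format can be sketched with `tidyr::pivot_longer()`. This assumes `tidyr` and `ggplot2` are installed; `wide` is a hypothetical four-row table built from a few rows of the wide example, not the full dataset.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one yield column per year (a few example rows only)
wide <- data.frame(
  county_code = c("001", "003", "007", "007"),
  state_name = c("Nebraska", "Nebraska", "Kansas", "Nebraska"),
  `2000` = c(161, 159, 160, 125),
  `2001` = c(185, 165, 169, 143),
  check.names = FALSE # keep the column names "2000" and "2001" as they are
)

# Stack the year columns into one `year` variable and one `corn_yield` variable
long <- pivot_longer(
  wide,
  cols = c(`2000`, `2001`),
  names_to = "year",
  values_to = "corn_yield"
)

# With a single `year` variable, faceting by year is straightforward
p_box <- ggplot(long) +
  geom_boxplot(aes(x = state_name, y = corn_yield, fill = state_name)) +
  facet_grid(. ~ year)
```

Each county-year pair becomes its own row, so the four-row wide table turns into eight rows. `year` comes out as character (`"2000"`, `"2001"`), which is fine for faceting.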
<br> For example, consider the following dataset in wide format: ``` ## county_code state_name 2000 2001 ## 1: 001 Nebraska 161 185 ## 2: 003 Nebraska 159 165 ## 3: 005 Nebraska 130 135 ## 4: 007 Kansas 160 169 ## 5: 007 Nebraska 125 143 ## --- ## 152: 191 Kansas 162 118 ## 153: 193 Kansas 169 197 ## 154: 195 Kansas 122 168 ## 155: 199 Kansas 161 158 ## 156: 203 Kansas 167 170 ``` This dataset stores county-level yields for Nebraska, Colorado, and Kansas in variables named `2000` and `2001` (the variable names themselves represent years). Imagine creating boxplots of corn yield with fill color differentiated by state and faceted by year. You will have trouble specifying `facet_grid()` because there is no single variable that represents `year`. This is why reshaping wide datasets into long format with `pivot_longer()` is very useful when creating figures. ] <!-- #========================================= # Other useful geom_* #========================================= --> --- class: inverse, center, middle name: other-geoms # Other supplementary `geom_*()` <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Other supplementary `geom_*()` .panelset[ .panel[.panel-name[geom_*] .left-full[ Here is a list of useful `geom_*()` functions: + `geom_vline()`: draw a vertical line + `geom_hline()`: draw a horizontal line + `geom_abline()`: draw a line with the specified intercept and slope + `geom_smooth()`: draw a fitted smooth line (by default loess for fewer than 1,000 observations and a GAM otherwise; `method = "lm"` draws an OLS regression line) + `geom_ribbon()`: create a shaded area + `geom_text()` and `annotate()`: add text to the figure We will use `g_fig_scatter` to illustrate how these functions work.
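A minimal sketch can check that `geom_smooth(method = "lm")` draws the same line as an OLS fit from `lm()`. It assumes `ggplot2` is installed. The data frame `toy` is simulated and hypothetical, and reuses the column names `d3_5_9` and `corn_yield` only for readability. `layer_data()` extracts the values ggplot2 actually plots for a given layer.

```r
library(ggplot2)

set.seed(1)
# Hypothetical simulated data (not county_yield)
toy <- data.frame(d3_5_9 = runif(200, 0, 20))
toy$corn_yield <- 180 - 2 * toy$d3_5_9 + rnorm(200, sd = 10)

p <- ggplot(toy, aes(x = d3_5_9, y = corn_yield)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is geom_smooth(): its x values and the fitted y values it draws
smooth_fit <- layer_data(p, 2)

# Refit the same OLS model by hand and predict at the same x values
ols <- lm(corn_yield ~ d3_5_9, data = toy)
by_hand <- predict(ols, newdata = data.frame(d3_5_9 = smooth_fit$x))

all.equal(smooth_fit$y, unname(by_hand))
```

Dropping `method = "lm"` from the call makes `geom_smooth()` fall back to its default smoother, and the two sets of values will generally no longer agree.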
] ] .panel[.panel-name[vline and hline] .left-code[ ```r g_fig_scatter + geom_vline( xintercept = 10, color = "blue" ) + geom_hline( yintercept = 100, color = "red" ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/hv-line-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[abline] .left-code[ ```r g_fig_scatter + geom_abline( #--- a ---# intercept = 50, #--- b ---# slope = 4, color = "blue" ) ``` `$$y = a + b\times x$$` + `intercept`: `\(a\)` + `slope`: `\(b\)` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ab-line-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[smooth] .left-code[ ```r g_fig_scatter + geom_smooth( aes( y = corn_yield, x = d3_5_9 ) ) ``` Also try ```r g_fig_scatter + geom_smooth( aes( y = corn_yield, x = d3_5_9 ), method = "lm" ) ``` ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/smooth-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[ribbon] .left-code[ ```r g_fig_scatter + geom_ribbon( aes( x = d3_5_9, ymin = 100, ymax = 200 ), fill = "green", alpha = 0.3 ) ``` + `ymin`: lower bound of the ribbon + `ymax`: upper bound of the ribbon Useful when drawing confidence intervals. 
] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/ribbon-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[text] .left-code[ ```r g_fig_scatter + geom_text( aes( x = d3_5_9, y = corn_yield, label = state_name ) ) ``` + `x`, `y`: where the text is placed + `label`: variable whose values are printed ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/text-ex-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[annotate] .left-code[ ```r g_fig_scatter + annotate( "text", x = 10, y = 50, label = "Drought hurts \n a lot!!", size = 3, color = "red" ) ``` + `x`: position on the x-axis + `y`: position on the y-axis + `label`: text to print (`\n` breaks the line) + `size`: font size ] .right-plot[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/annotate-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Multiple datasets in one figure .panelset[ .panel[.panel-name[multiple datasets] .left-full[ <span style="color:red">Important: </span>`data = county_yield` declared inside `ggplot()` applies to ALL subsequent `geom_*()`s unless overridden locally inside individual `geom_*()`s. Try this and see what happens to `geom_smooth()`, which gets no data from `ggplot()`: ```r ggplot() + geom_point(data = county_yield, aes(y = corn_yield, x = d3_5_9)) + geom_smooth(aes(y = corn_yield, x = d3_5_9)) ``` It is easy to use multiple datasets inside a single `ggplot` object (or figure). You just need to specify which dataset to use locally inside individual `geom_*()`s.
Let's see how this works with an example: drawing a confidence interval around the fitted line of the following regression: <br> `$$corn\_yield = \beta_0 + \beta_1 d3\_5\_9 + v$$` ] ] .panel[.panel-name[Preparation] .left-full[ ```r #--- regression ---# reg <- lm(corn_yield ~ d3_5_9, data = county_yield) #--- find confidence interval ---# min_d3 <- county_yield$d3_5_9 %>% min(na.rm = TRUE) # minimum d3 observed max_d3 <- county_yield$d3_5_9 %>% max(na.rm = TRUE) # maximum d3 observed eval_points <- data.frame(d3_5_9 = seq(min_d3, max_d3, length.out = 1000)) # evaluation points ci_bound <- predict(reg, newdata = eval_points, interval = "confidence", level = 0.9) # fitted values with 90% lower and upper bounds ci_bound_data <- cbind(eval_points, ci_bound) # combine evaluation points and ci ``` ```r head(ci_bound_data) ``` ``` ## d3_5_9 fit lwr upr ## 1 0.00000000 180.4965 179.5620 181.4311 ## 2 0.02202202 180.4657 179.5332 181.3981 ## 3 0.04404404 180.4349 179.5045 181.3652 ## 4 0.06606607 180.4041 179.4758 181.3324 ## 5 0.08808809 180.3733 179.4470 181.2995 ## 6 0.11011011 180.3424 179.4182 181.2667 ``` ] ] <!-- panel ends here --> ] --- count: false # Multiple datasets in one figure .panel1-mult-geom-user[ ```r *ggplot() + #--- scatter plot ---# * geom_point( * data = county_yield, * aes(y = corn_yield, x = d3_5_9) * ) + # BREAK #--- regression line ---# * geom_line( * data = ci_bound_data, * aes(x = d3_5_9, y = fit), * color = "blue", * size = 1.2 * ) + # BREAK #--- confidence interval ---# * geom_ribbon( * data = ci_bound_data, * aes(x = d3_5_9, ymin = lwr, ymax = upr), * fill = "red", * alpha = 0.4 * ) ``` ] .panel2-mult-geom-user[ <img src="data:image/png;base64,#data_visualization_basics_x_files/figure-html/mult-geom_user_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-mult-geom-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mult-geom-user { color: black; width: 49%; hight: 32%; float:
left; padding-left: 1%; font-size: 80% } .panel3-mult-geom-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>